Open source clustering software.
Abstract
SUMMARY: We have implemented k-means clustering, hierarchical clustering and self-organizing maps in a single multipurpose open-source library of C routines, callable from other C and C++ programs. Using this library, we have created an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. In addition, we generated a Python and a Perl interface to the C Clustering Library, thereby combining the flexibility of a scripting language with the speed of C.

AVAILABILITY: The C Clustering Library and the corresponding Python C extension module Pycluster were released under the Python License, while the Perl module Algorithm::Cluster was released under the Artistic License. The GUI code Cluster 3.0 for Windows, Macintosh and Linux/Unix, as well as the corresponding command-line program, were released under the same license as the original Cluster code. The complete source code is available at http://bonsai.ims.u-tokyo.ac.jp/mdehoon/software/cluster. Alternatively, Algorithm::Cluster can be downloaded from CPAN, while Pycluster is also available as part of the Biopython distribution.
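As a rough illustration of the first of the three methods named in the abstract, the following is a minimal pure-Python sketch of k-means clustering. It is not the C Clustering Library's code and does not use the Pycluster or Algorithm::Cluster APIs; the function names (`squared_distance`, `kmeans`), the parameters, and the toy data are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Illustrative k-means (hypothetical helper, not the library's routine).

    Returns a list of cluster indices (one per point) and the final centroids.
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # No point changed cluster, so the solution is stable.
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # Leave an empty cluster's centroid where it is.
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Toy data (made up for this sketch): two well-separated groups of points.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centers = kmeans(data, k=2)
print(labels)
print(centers)
```

A compiled C routine performs the same assignment and update steps far faster on large expression datasets, which is the trade-off the abstract points to when it pairs a scripting interface with a C core.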
Related articles
Hierarchical Clustering Based Automatic Refactorings Detection
The structure of software systems is subject to many changes during the system's lifecycle. Continuous improvement of the software system's structure can be achieved through refactoring, which ensures a clean and easy-to-maintain software structure. In this paper we focus on the problem of restructuring object-oriented software systems using hierarchical clustering. We propose two hierarchical c...
CARP: Software for Fishing Out Good Clustering Algorithms
This paper presents the CLUSTERING ALGORITHMS’ REFEREE PACKAGE or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each d...
A K-Means Based Clustering Approach for Finding Faulty Modules in Open Source Software Systems
Prediction of fault-prone modules provides one way to support software quality engineering. Clustering is used to determine the intrinsic grouping in a set of unlabeled data. Among various clustering techniques available in literature K-Means clustering approach is most widely being used. This paper introduces K-Means based Clustering approach for software finding the fault proneness of the Obj...
Processing and Analysis of Biomedical Nonlinear Signals by Data Mining Methods
The paper demonstrates a nonlinear signal processing method based on an approach found in intelligent data mining. ECG signals were used as an interesting and readily available representative nonlinear domain. These signals were fed in an innovative software platform for feature extraction based on chaos theory. The resultant files were loaded into an open source machine learning software for c...
Refactorings Detection Using Hierarchical Clustering
Refactoring is a process that helps to maintain the internal software quality, during the whole software lifecycle. This paper aims at introducing a new hierarchical clustering algorithm that can be used for improving software systems design, by identifying the appropriate refactorings. The algorithm is named HARD (Hierarchical Clustering Algorithm for Refactorings Determination) and uses a new...
Combining Clustering and Classification for Software Quality Evaluation
Source code and metric mining have been used to successfully assist with software quality evaluation. This paper presents a data mining approach which incorporates clustering Java classes, as well as classifying extracted clusters, in order to assess internal software quality. We use Java classes as entities and static metrics as attributes for data mining. We identify outliers and apply K-mean...
Journal: Bioinformatics
Volume 20, Issue 9
Publication date: 2004